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Abstract. Principal component analysis (PCA) requires the computation of a low-rank approx- 
imation to a matrix containing the data being analyzed. In many applications of PCA, the best 
possible accuracy of any rank-deficient approximation is at most a few digits (measured in the spec- 
tral norm, relative to the spectral norm of the matrix being approximated). In such circumstances, 
efficient algorithms have not come with guarantees of good accuracy, unless one or both dimensions 
of the matrix being approximated are small. We describe an efficient algorithm for the low-rank 
approximation of matrices that produces accuracy very close to the best possible, for matrices of 
arbitrary sizes. We illustrate our theoretical results via several numerical examples. 
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1. Introduction. Principal component analysis (PCA) is among the most widely 
used techniques in statistics, data analysis, and data mining. PCA is the basis of many 
machine learning methods, including the latent semantic analysis of large databases of 
text and HTML documents described in [6]. Computationally, PCA amounts to the 
low-rank approximation of a matrix containing the data being analyzed. The present 
article describes an algorithm for the low-rank approximation of matrices, suitable for 
PCA. This paper demonstrates both theoretically and via numerical examples that 
the algorithm efficiently produces low-rank approximations whose accuracies are very 
close to the best possible. 

The canonical construction of the best possible rank-fc approximation to a real 
m X n matrix A uses the singular value decomposition (SVD) of A, 

A^UT,V^, (1.1) 

where C/ is a real unitary m x m matrix, V is a real unitary n x n matrix, and E is 
a real mx n matrix whose only nonzero entries appear in nonincreasing order on the 
diagonal and are nonnegative. The diagonal entries cti, (72, ■ • • , o'min(m,n)-ii o'min(m.n) 
of E are known as the singular values of A. The best rank-A; approximation to A, 
with k < m and fc < n, is 

AfnUT.V'^, (1.2) 

where U is the leftmost m x k block of U, V is the leftmost nx k block of V , and E 
is the k X k matrix whose only nonzero entries appear in nonincreasing order on the 
diagonal and are the k greatest singular values of A. This approximation is "best" in 
the sense that the spectral norm || A — _B|| of the difference between A and a rank-fc 
matrix B is minimal for i? = {/ E V"^ . In fact, 

\\A-UtV^\\=ak+u (1.3) 
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where a^+i is the {k + 1)^*' greatest singular value of A. For more information about 
the SVD, see, for example. Chapter 8 in [20]. 

For definiteness, let us assume that m < n and that A is an arbitrary (dense) real 
mxn matrix. To compute a rank-A; approximation to A, one might form the matrices 
U, S, and V in (1.1), and then use them to construct U, E, and V in (1.2). However, 
even computing just E, the leftmost m columns of U, and the leftmost m columns of 
V requires at least 0{nm?) floating-point operations (flops) using any of the standard 
algorithms (see, for example;. Chapter 8 in [20]). Alternatively, one might use pivoted 
(5i?-decomposition algorithms, which require 0{nmk) flops and typically produce a 
rank-A; approximation B to A such that 

\\A-B\\<lOV^ak+i, (1.4) 

where — is the spectral norm of A — B, and ak+i is the (A:+l)''* greatest singular 
value of A (see, for example, Chapter 5 in [20]). Furthermore, the algorithms of [24] 
require only about 0{nmk) flops to produce a rank-Zc approximation that (unlike 
an approximation produced by a pivoted Q-R-decomposition) has been guaranteed to 
satisfy a bound nearly as strong as (1.4). 

While the accuracy in (1.4) is sufficient for many applications of low-rank approx- 
imation, PC A often involves m > 10,000, and a "signal-to-noise ratio" ai/ak+i < 100, 
where cti = ||^|| is the greatest singular value of A, and ak+i is the (/;; + 1)'** greatest. 
Moreover, the singular values < a^+i often arise from noise in the process generating 
the data in A, making the singular values of A decay so slowly that am > Cfe+i/10. 
When m > 10,000, ai/ak+i < 100, and am > ak+i/W, the rank-fc approximation B 
produced by a pivoted Qi?-dccomposition algorithm typically satisfies HA — i?]] ^ \\A\\ 
— the "approximation" B is effectively unrelated to the matrix A being approximated! 
For large matrices whose "signal-to-noise ratio" a\/ak+i is less than 10,000, the y/m 
factor in (1.4) may be unacceptable. Now, pivoted Qi^-docomposition algorithms are 
not the only algorithms which can compute a rank-fc approximation using 0{nmk) 
flops. However, other algorithms, such as those of [1], [2], [3], [5], [7], [8], [10], [11], 
[12], [13], [14], [15], [16], [17], [18], [21], [22], [23], [24], [25], [27], [28], [30], [32], [33], 
[34], [35], and [37], also yield accuracies involving factors of at least ^/m when the 
singular values ak+i, ak+2, o^k+a, ... of A decay slowly. (The decay is rather slow if, 
for example, ak+j ^ j" ak+i for j = 1, 2, 3, . . . , with — 1/2 < a < 0. Many of these 
other algorithms are designed to produce approximations having special properties 
not treated in the present paper, and their spectral-norm accuracy is good when the 
singular values decay sufficiently fast. Fairly recent surveys of algorithms for low-rank 
approximation are available in [32], [33], and [27].) 

The algorithm described in the present paper produces a rank-fc approximation 
B to A such that 

\\A-B\\<C mi/(^'+2) CTfe+i (1.5) 

with very high probability (typically 1 — 10^^^, independent of A, with the choice of 
parameters from Remark 4.2 below), where \\A — B\\ is the spectral norm of A — B, i 
is a nonnegative integer specified by the user, ak+i is the (fc -I- 1)"'' greatest singular 
value of A, and C is a constant independent of A that theoretically may depend 
on the parameters of the algorithm. (Numerical evidence such as that in Section 5 
suggests at the very least that C < 10; (4.23) and (4.10) in Section 4 provide more 
complicated theoretical bounds on C.) The algorithm requires 0{nmki) floating-point 
operations when i > 0. In many applications of PCA, z = 1 or i = 2 is sufficient, and 
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the algorithm then requires only 0{nmk) flops. The algorithm provides the rank-fc 
approximation B in the form of an SVD, outputting three matrices, U, S, and V, 
such that B = tJT, V^, where the columns of U are orthonormal, the columns of V 
are orthonormal, and the entries of S are all nonnegative and zero off the diagonal. 

The algorithm of the present paper is randomized, but succeeds with very high 
probability; for example, the bound (4.23) on its accuracy holds with probability 
greater than 1 — 10~^^. The algorithm is similar to many recently discussed random- 
ized algorithms for low-rank approximation, but produces approximations of higher 
accuracy when the singular values CTfc+i, crfe+2, crfc+3, ... of the matrix being approx- 
imated decay slowly; see, for example, [32] or [27]. The algorithm is a variant of that 
in [31], and the analysis of the present paper should extend to the algorithm of [31]; 
[31] stimulated the authors' collaboration. The algorithm may be regarded as a gen- 
eralization of the randomized power methods of [9] and [26], and in fact we use the 
latter to ascertain the approximations' accuracy rapidly and reliably. 

The algorithm admits obvious "out-of-core" and parallel implementations (assum- 
ing that the user chooses the parameter i in (1.5) to be reasonably small). As with 
the algorithms of [9], [26], [27], [29], [31], [32], and [33], the core steps of the algorithm 
of the present paper involve the application of the matrix A being approximated and 
its transpose to random vectors. The algorithm is more efficient when A and A'^ 
can be applied rapidly to arbitrary vectors, such as when A is sparse. 

Throughout the present paper, we use 1 to denote an identity matrix. We use 
to denote a matrix whose entries are all zeros. For any matrix A, we use || j4|| to denote 
the spectral norm of A, that is, \\A\\ is the greatest singular value of A. Furthermore, 
the entries of all matrices in the present paper are real valued, though the algorithm 
and analysis extend trivially to matrices whose entries are complex valued. 

The present paper has the following structure: Section 2 collects together various 
known facts which later sections utilize. Section 3 provides the principal lemmas 
used in bounding the accuracy of the algorithm in Section 4. Section 4 describes the 
algorithm of the present paper. Section 5 illustrates the performance of the algorithm 
via several numerical examples. The appendix, Section 6, proves two lemmas stated 
earlier in Section 3. We encourage the reader to begin with Sections 4 and 5, referring 
back to the relevant portions of Sections 2 and 3 as they are referenced. 

2. Preliminaries. In this section, we summarize various facts about matrices 

and functions. Subsection 2.1 discusses the singular values of arbitrary matrices. 
Subsection 2.2 discusses the singular values of certain random matrices. Subsection 2.3 
observes that a certain function is monotone. 

2.1. Singular values of general matrices. The following trivial technical 
lemma will be needed in Section 3. 

Lemma 2.1. Suppose that m and n are positive integers with m > n. Suppose 

further that A is a real m x n matrix such that the least (that is, the n*'* greatest) 
singular value an of A is nonzero. 

Then, 

\UA^A)-^A'^\\ = —. (2.1) 

The following lemma states that the greatest singular value of a matrix A is at 
least as large as the greatest singular value of any rectangular block of entries in A; 



4 



ROKHLIN, SZLAM, AND TYGERT 



the lemma is a straightforward consequence of the minimax properties of singular 
values (sec, for example, Section 47 of Chapter 2 in [36]). 

Lemma 2.2. Suppose that k, I, m, and n are positive integers with k < m and 
I < n. Suppose further that A is a real m x n matrix, and B is a k x I rectangular 
block of entries in A. 

Then, the greatest singular value of B is at most the greatest singular value of 

A. 

The following classical lemma provides an approximation Q to an n x Z matrix 
R via an n X A; matrix Q whose columns are orthonormal, and a. k x I matrix S. As 
remarked in Observation 2.4, the proof of this lemma provides a classic algorithm 
for computing Q and S, given R. We include the proof since we will be using this 
algorithm. 

Lemma 2.3. Suppose that k, I, and n are positive integers with k < I < n, and 
R is a real n x I matrix. 

Then, there exist a real n x k matrix Q whose columns are orthonormal, and a 
real k x I matrix S, such that 

\\Q S - R\\ < pk+u (2.2) 

where pu+i is the (k + I)'** greatest singular value of R. 
Proof. We start by forming an SVD of R, 

R = UY.V'^, (2.3) 

where U is s. real n x I matrix whose columns are orthonormal, ^ is a real / x I matrix 
whose columns are orthonormal, and S is a real diagonal I x I matrix, such that 

= (2.4) 

for j = 1,2, — 1,/, where S^j is the entry in row j and column j of S, and pj 

is the j^^ greatest singular value of R. We define Q to be the leftmost nxk block of 
U, and P to be the rightmost n x {I — k) block of U, so that 

U={Q\P). (2.5) 

We define S to be the uppermost k x I block of J^V'^, and T to be the lowermost 
{l-k)xl block of S F"^, so that 

(2-6) 

Combining (2.3), (2.4), (2.5), (2.6), and the fact that the columns of U are orthonor- 
mal, as are the columns of V, yields (2.2). □ 

Observation 2.4. In order to compute the matrices Q and S in (2.2) from 
the matrix R, we can construct (2.3), and then form Q and S according to (2.5) 
and (2.6). (See, for example, Chapter 8 in [20] for details concerning the computation 
of the SVD.) 

2.2. Singular values of random matrices. The following lemma provides a 
highly probable upper bound on the greatest singular value of a square matrix whose 
entries are independent, identically distributed (i.i.d.) Gaussian random variables of 
zero mean and unit variance; Formula 8.8 in [19] provides an equivalent formulation 
of the lemma. 
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Lemma 2.5. Suppose that n is a positive integer, G is a real nxn matrix whose 
entries are i.i.d. Gaussian random variables of zero mean and unit variance, and-j is 
a positive real number, such that 7 > 1 and 



1 / 27 



4(72- Ve'^ 



(2.7) 



is nonnegative. 

Then, the greatest singular value of G is at most y/2n^ with probability not less 
than the amount in (2.7). 

Combining Lemmas 2.2 and 2.5 yields the following lemma, providing a highly 
probable upper bound on the greatest singular value of a rectangular matrix whose 
entries are i.i.d. Gaussian random variables of zero mean and unit variance. 

Lemma 2.6. Suppose that I, m, and n are positive integers with n > I and n> m. 
Suppose further that G is a real I x m matrix whose entries are i.i.d. Gaussian random 
variables of zero mean and unit variance, and j is a positive real number, such that 
7 > 1 and (2.7) is nonnegative. 

Then, the greatest singular value of G is at most V2n^ with probability not less 
than the amount in (2.7). 

The following lemma provides a highly probable lower bound on the least singular 
value of a rectangular matrix whose entries arc i.i.d. Gaussian random variables of 
zero mean and unit variance; Formula 2.5 in [4] and the proof of Lemma 4.1 in [4] 
together provide an equivalent formulation of Lemma 2.7. 

Lemma 2.7. Suppose that j and I are positive integers with ) < I. Suppose further 
that G is a real I x j matrix whose entries are i.i.d. Gaussian random variables of 
zero mean and unit variance, and (5 is a positive real number, such that 



I / „ \ '-J+i 



V27r(l-j + l) \{l-j + l)f3 



(2.8) 



is nonnegative. 

Then, the least (that is, the j*'* greatest) singular value of G is at least l/iVl P) 
with probability not less than the amount in (2.8). 

2.3. A monotone function. The following technical lemma will be needed in 
Section 4. 

Lemma 2.8. Suppose that a is a nonnegative real number, and f is the function 
defined on (0, 00) via the formula 

/(.) 1 (^)^ (2.9) 

Then, f decreases monotonically for x> a. 
Proof The derivative of / is 

/'(.)=/(.)(lng)-^) (2.10) 



for any positive real number x. The right-hand side of (2.10) is negative when x > a. 
□ 
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3. Mathematical apparatus. In this section, we provide lemmas to be used 
in Section 4 in bounding the accuracy of the algorithm of the present paper. 

The following lemma, proven in the appendix (Section 6), shows that the product 
A Q Q'^ of matrices A, Q, and is a good approximation to a matrix A, provided 
that there exist matrices G and S such that 

1. the columns of Q are orthonormal, 

2. Q S is a good approximation to (G {A A'^Y A)^, and 

3. there exists a matrix F such that \\F\\ is not too large, and FG {AA'^y A is 
a good approximation to A. 

Lemma 3.1. Suppose that i, k, I, m, and n are positive integers with k < I < 
m < n. Suppose further that A is a real mxn matrix, Q is a real nx k m,atrix whose 
columns are orthonormal, S is a real k x I matrix, F is a real m x I matrix, and G is 
a real I x m matrix. 



\\AQQ^-A\\<2\\FG{AA^yA-A\\+2\\F\\\\QS-{G{AA^yAf\\. (3.1) 



The following lemma, proven in the appendix (Section 6), states that, for any 
positive integer i, matrix A, and matrix G whose entries are i.i.d. Gaussian random 
variables of zero mean and unit variance, with very high probability there exists a 
matrix F with a reasonably small norm, such that F G {AA^Y A is a good approxi- 
mation to A. This lemma is similar to Lemma 19 of [29]. 

Lemma 3.2. Suppose that i, j, k, I, m, and n are positive integers with j < k < 
I < m < n. Suppose further that A is a real mxn matrix, G is a real I x m matrix 
whose entries are i.i.d. Gaussian random variables of zero mean and, unit variance, 
and (3 and 7 are positive real numbers, such that the j*'' greatest singular value aj of 
A is positive, 7 > 1, and 



Then, 





2 \ max(m— fe, I) 



is nonnegative. 

Then, there exists a real m x I matrix F such that 



\\FG{AA^yA-A\\ < V2/2/3272 + 1 aj+i 




and 




Vip 



(3.4) 
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with probability not less than $ defined in (3.2), where aj is the j"' greatest singular 
value of A, (Jj+i is the {j + 1)** greatest singular value of A, and cJk+i is the (fc + 1)** 
greatest singular value of A. 

Given a matrix A, and a matrix G whose entries are i.i.d. Gaussian random vari- 
ables of zero mean and unit variance, the following lemma provides a highly probable 
upper bound on the singular values of the product G A in terms of the singular values 
of A. This lemma is reproduced from [29], where it appears as Lemma 20. 

Lemma 3.3. Suppose that j, k, I, m, and n are positive integers with k < I, such 
that k + j < m and k + j < n. Suppose further that A is a real m x n matrix, G is 
a real I x m matrix whose entries are i.i.d. Gaussian random variables of zero mean 
and unit variance, and 'f is a positive real number, such that 7 > 1 and 



S = 1 - 



1 / 2 2 ^ max(m — fc— J, /) 



4(72 - 1) -^TT max(m- /c-j, 1)72 Ve'^^ ^ 

1 f 27^ 



4 (72 - 1) ^TT max(A: + j, I) 7^ 



2 I prr-i 



(3.5) 



is nonnegative. 
Then, 



Pk+i < ^/2 max(fc + j, I) 7 C7fe+i + ^2 max(m - k-j,l) 7 (Tk+j+i (3.6) 

with probability not less than S defined in (3.5), where pk+i is the {k + 1)** greatest 
singular value of G A, a^+i is the {k + 1)** greatest singular value of A, and ak+j+i 
is the [k + j + 1)"' greatest singular value of A. 

The following corollary follows immediately from the preceding lemma, by re- 
placing the matrix A with {AA'^yA, the integer k with j, and the integer j with 
k-j. 

Corollary 3.4. Suppose i, j, k, I, m, and n are positive integers with j < k < 
I < m < n. Suppose further that A is a real m x n matrix, G is a real I x m matrix 
whose entries are i.i.d. Gaussian random variables of zero mean and unit variance, 
and is a positive real number, such that 7 > 1 and 



* = 1 



4(72 - 1) yVmax(m^^"fc70^ Ve"^ ^ 



2^ 2 \ max(m— fe, I) 



4(72-1) yVl^ Ve^ 



is nonnegative. 
Then, 

Pj+i <V2l^ + v/2 max(m-A;,Z) 7 iak+if'+^ (3.8) 



with probability not less than 4* defined in (3.7), where Pj+i is the {j + I)''* greatest 
singular value of G {AA'^Y A, (Jj+i is the [j + 1)** greatest singular value of A, and 
cTfc+i is the (k + 1)"* greatest singular value of A. 
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4. Description of the algorithm. In this section, we describe the algorithm 

of the present paper, providing details about its accuracy and computational costs. 
Subsection 4.1 describes the basic algorithm. Subsection 4.2 tabulates the computa- 
tional costs of the algorithm. Subsection 4.3 describes a complementary algorithm. 
Subsection 4.4 describes a computationally more expensive variant that is somewhat 
more accurate and tolerant to roundoff. 

4.1. The algorithm. Suppose that i, k, m, and n are positive integers with 
2k < m < n, and A is a real m x n matrix. In this subsection, we will construct an 
approximation to an SVD of A such that 

\\A-Ui:V^\\<Cm^/^^'+^Kk+i (4.1) 

with very high probability, where Z7 is a real mxk matrix whose columns are orthonor- 
mal, y is a real n x k matrix whose columns are orthonormal, S is a real diagonal 
kx k matrix whose entries are all nonnegative, c^+i is the {k + ly^ greatest singular 
value of A, and C is a constant independent of A that depends on the parameters of 
the algorithm. (Section 5 will give an empirical indication of the size of C, and (4.23) 
will give one of our best theoretical estimates to date.) 

Intuitively, we could apply A'^ to several random vectors, in order to identify the 
part of its range corresponding to the larger singular values. To enhance the decay 
of the singular values, we apply A'^ {AA'^y instead. Once we have identified most of 
the range of A'^ , we perform several linear-algebraic manipulations in order to recover 
an approximation to A. (It is possible to obtain a similar, somewhat less accurate 
algorithm by substituting our short, fat matrix A for A'^ , and A'^ for ^4.) 

More precisely, we choose an integer I > k such that I < m — k (for example, 
I = k + 12), and make the following five steps: 

1. Using a random number generator, form a real / x m matrix G whose entries 
are i.i.d. Gaussian random variables of zero mean and unit variance, and 
compute the I x n product matrix 

R = G{AA^Y A. (4.2) 

2. Using an SVD, form a real n x k matrix Q whose columns are orthonormal, 
such that there exists a real k x I matrix 5 for which 

\\QS-R^\\<Pk+i, (4.3) 

where pk+i is the (fc + I)''* greatest singular value of R. (See Observation 2.4 
for details concerning the construction of such a matrix Q.) 

3. Compute the mxk product matrix 

T = AQ. (4.4) 

4. Form an SVD of T, 

T=C/EVF^, (4.5) 

where U is a real mxk matrix whose columns arc orthonormal, W is a real 
k X k matrix whose columns are orthonormal, and E is a real diagonal k x k 
matrix whose entries are all nonnegative. (Sec, for example. Chapter 8 in [20] 
for details concerning the construction of such an SVD.) 
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5. Compute the n x k product matrix 

V = QW. (4.6) 

The following theorem states precisely that the matrices U, E, and V satisfy (4.1). 
See (4.23) for a more compact (but less general) formulation. 

Theorem 4.1. Suppose that i, k, I, m, and n are positive integers with k < I < 
m — k and m < n, and A is a real m x n matrix. Suppose further that (3 and 7 are 
positive real numbers such that 7 > 1, 

{l-k + l)P>l, (4.7) 



and 



272 X""-' 1 / 272 



2(72-l)^7r(m-A;)72 \e^'-^ J 2 {-y^ - 1) 

k+l 

(4.9) 



n = 1- 



V27r (l-k + l) \{l-k + l)f3 

is nonnegative. Suppose in addition that U , S, and V are the matrices produced via 
the five-step algorithm of the present subsection, given above. 
Then, 

\\A-UY.V^\\<l&^l3ir^^\ cjk+i (4.10) 

with probability not less than 11, where H is defined in (4-9), and ak+i is the (k + 1)** 
greatest singular value of A. 

Proof. Observing that U S V'^ = AQ Q^, it is sufficient to prove that 

\\AQQ^ -A\\< 16 J (31 i^^^j afe+i (4.11) 

with probability H, where Q is the matrix from (4.3), since combining (4.11), (4.4), 
(4.5), and (4.6) yields (4.10). We now prove (4.11). 
First, we consider the case when 



\\A\\ < ^ J (4.12) 
Clearly, 

||AggT-A||<||A||||Q||||Ql + |H|. (4.13) 

But, it follows from the fact that the columns of Q are orthonormal that 

IIOII < 1 (4.14) 

and 

IIQ^II < 1. (4.15) 
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Combining (4.13), (4.14), (4.15), (4.12), and (4.8) yields (4.11), completing the proof 
for the case when (4.12) holds. 

For the remainder of the proof, we consider the case when 



k 



l/(4i+2) 



To prove (4.11), we will use (3.1) (which is restated and proven in Lemma 6.1 in the 
appendix), namely, 

\\AQQ'^-A\\ <2\\FG{AA'^yA~A\\+2\\F\\ \\Q S - {G {AA'^y Af\\ (4.17) 

for any real m x I matrix F, where G is from (4.2), and Q and S are from (4.3). We 
now choose an appropriate matrix F. 

First, we define j to be the positive integer such that 

/,„_fc\l/(4i+2) 

aj+i < ( — ^fc+i < (4-18) 

where aj is the j''^ greatest singular value of A, and Uj+i is the (j + 1)*** greatest (such 
an integer j exists due to (4.16) and the supposition of the theorem that I < m — k). 
We then use the matrix F from (3.3) and (3.4) associated with this integer j, so that 
(as stated in (3.3) and (3.4), which are restated and proven in Lemma 6.2 in the 
appendix) 

\\FG{AA^yA-A\\ < ^fWW^F^ (^i+i 



+ W 2Z max(m - k, I) /J^ 72 ( ^ ] + 1 afe+i (4.19) 



and 



^11 < h^i (4-20) 



with probability not less than $ defined in (3.2). Formula (4.19) bounds the first 
term in the right-hand side of (4.17). 

To bound the second term in the right-hand side of (4.17), we observe that j < A:, 
due to (4.18) and the supposition of the theorem that I < m — k. Combining (4.3), 
(4.2), (3.8), and the fact that j <k yields 

\\QS-{G{AA^yAf\\ < V2/7 (c7,-+i)2^+i + v/2 max(m- fc,/) 7 {ak+if'+^ (4.21) 
with probability not less than ^ defined in (3.7). Combining (4.20) and (4.21) yields 

lli^ll WQS- {GiAA-^yAfW < x/2]V^^,+i 



+ y2/ max(m-fc, 2)72/32 (^^^ ak+i (4.22) 

with probability not less than 11 defined in (4.9). The combination of Lemma 2.8, 
(4.7), and the fact that j < k justifies the use of k (rather than the j used in (3.2) for 
$) in the last term in the right-hand side of (4.9). 
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Combining (4.17), (4.19), (4.22), (4.18), (4.8), and the supposition of the theorem 
that I < m — k yields (4.11), completing the proof. □ 

Remark 4.2. Choosing I = k + 12, P = 2.57, and 7 = 2.43 in (4.9) and (4.10) 
yields 

||A-f/SyT|| < loo; /I^L-^ J CTfc+i (4.23) 

with probability greater than 1 — 10~^^, where ak+i is the (A: + I)*'* greatest singular 
value of A. Numerical experiments (some of which are reported in Section 5) indicate 
that the factor 100? in the right-hand side of (4.23) is much greater than necessary. 

Remark 4.3. Above, we permit I to be any integer greater than k. Stronger 
theoretical bounds on the accuracy are available when I > 2k. Indeed, via an analysis 
similar to the proof of Theorem 4.1 (using in addition the result stated in the abstract 
of [4]), it can be shown that the following six-step algorithm with / > 2k produces 
matrices U, S, and V satisfying the bound (4.10) with its right-hand side reduced by 
a factor of 

1. Using a random number generator, form a real I x m matrix G whose entries 
are i.i.d. Gaussian random variables of zero mean and unit variance, and 
compute the / x n product matrix 

R = G(AA'^y A. (4.24) 

2. Using a pivoted Qii-decomposition algorithm, form a real n x I matrix Q 
whose columns are orthonormal, such that there exists a real I x I matrix S 
for which 

R^ = QS. (4.25) 

(See, for example, Chapter 5 in [20] for details concerning the construction 
of such a matrix Q.) 

3. Compute the m x I product matrix 

T = AQ. (4.26) 

4. Form an SVD of T, 

T=Utw'^, (4.27) 

where [/ is a real m x / matrix whose columns arc orthonormal, is a real I x I 
matrix whose columns arc orthonormal, and E is a real diagonal / x I matrix 
whose only nonzero entries are nonnegative and appear in nonincreasing order 
on the diagonal. (See, for example. Chapter 8 in [20] for details concerning 
the construction of such an SVD.) 

5. Compute the n x I product matrix 

V = QW. (4.28) 

6. Extract the leftmost m x k block U of U, the leftmost n x k block V of V, 
and the leftmost uppermost k x k block E of E. 
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4.2. Computational costs. In this subsection, we tabulate the number of 
floating-point operations rcqnircd by the five-step algorithm described in Subsec- 
tion 4.1 as applied once to a matrix A. 

The algorithm incurs the following costs in order to compute an approximation 
to an SVD of A: 

1. Forming R in (4.2) requires applying A to il column vectors, and A'^ to (i+1) I 
column vectors. 

2. Computing Q in (4.3) costs OQ'^n). 

3. Forming T in (4.4) requires applying A to k column vectors. 

4. Computing the SVD (4.5) of T costs 0{P m). 

5. Forming V in (4.6) costs 0{k'^ n). 

Summing up the costs in Steps 1-5 above, and using the fact that k < I < m < n, we 
conclude that the algorithm of Subsection 4.1 costs 

CpcA = {il + k)-CA + (il + I) ■ Ca-t + 0{l^ n) (4.29) 

floating-point operations, where Ca is the cost of applying ^ to a real n x 1 column 
vector, and Cat is the cost of applying A^ to a real m x 1 column vector. 

Remark 4.4. We observe that the algorithm only requires applying A to il + k 
vectors and A"^ to il + I vectors; it does not require explicit access to the individual 
entries of A. This consideration can be important when A and A^ are available 
solely in the form of procedures for their applications to arbitrary vectors. Often such 
procedures for applying A and A^ cost much less than the standard procedure for 
applying a dense matrix to a vector. 

4.3. A modified algorithm. In this subsection, we describe a simple modifi- 
cation of the algorithm described in Subsection 4.1. Again, suppose that i, k, I, m, 

and n are positive integers with k < I < m — k and m < n, and A is a real m x n 
matrix. Then, the following five-step algorithm constructs an approximation to an 
SVD of A^ such that 

||AT-C/EI^T|| <Cmi/(4^)(7fe+i (4.30) 

with very high probability, where [/ is a real nxk matrix whose columns are orthonor- 
mal, y is a real m x k matrix whose columns are orthonormal, S is a real diagonal 
kx k matrix whose entries are all nonnegative, ak+i is the (fc -|- l)**' greatest singular 
value of A, and C is a constant independent of A that depends on the parameters of 
the algorithm: 

1. Using a random number generator, form a real I x m matrix G whose entries 
are i.i.d. Gaussian random variables of zero mean and unit variance, and 
compute the I x m product matrix 

R = G{AA'^y. (4.31) 

2. Using an SVD, form a real m x k matrix Q whose columns are orthonormal, 
such that there exists a real k x I matrix S for which 

\\QS-R'^\\<Pk+i, (4.32) 

where pk+i is the {k + I)'** greatest singular value of R. (See Observation 2.4 
for details concerning the construction of such a matrix Q.) 
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3. Compute the n x k product matrix 

T = A'^Q. 



4. Form an SVD of T, 



T = UEW'^, 



(4.33) 



(4.34) 



where J7 is a real n x k matrix whose columns are orthonormal, W is a real 
k X k matrix whose columns are orthonormal, and E is a real diagonal k x k 
matrix whose entries are all nonnegative. (See, for example. Chapter 8 in [20] 
for details concerning the construction of such an SVD.) 
Compute the mx k product matrix 



V = QW. 



(4.35) 



Clearly, (4.30) is similar to (4.1), as (4.31) is similar to (4.2). 
Remark 4.5. The ideas of Remark 4.3 are obviously relevant to the algorithm 
of the present subsection, too. 

4.4. Blanczos. In this subsection, we describe a modification of the algorithm 
of Subsection 4.1, enhancing the accuracy at a little extra computational expense. 
Suppose that i, k, I, m, and n are positive integers with k < I and {i + 1)1 < m — k, 
and ^ is a real mxn matrix, such that m < n. Then, the following five-step algorithm 
constructs an approximation UT,V^ to an SVD of A: 

1. Using a random number generator, form a real I x m matrix G whose entries 
are i.i.d. Gaussian random variables of zero mean and unit variance, and 
compute the I x n matrices R^^^ , R^^^ , • • • , R^^~^^ , i?^'^ defined via the formulae 



i?W = GA, 



(4.36) 



=i?(i) A^A, 



(4.37) 
(4.38) 



Rd) = rH-^) A. 
Form the {{i + 1)1) x n matrix 



RW 



R = 



Ri^'-l) 



(4.39) 
(4.40) 



(4.41) 
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2. Using a pivoted QiJ-decomposition algorithm, form a real n x + matrix 
Q whose columns arc orthonormal, such that there exists a real ((i + 1)1) x 
{{i + l)l) matrix S for which 

= QS. (4.42) 

(Sec, for example. Chapter 5 in [20] for details concerning the construction 
of such a matrix Q.) 

3. Compute the m x + 1)1) product matrix 

T = AQ. (4.43) 

4. Form an SVD of T, 

r=[/STy^, (4.44) 

where U is a, real m x ((z + 1)^) matrix whose columns are orthonormal, W is 
a real ((i + 1)1) x {{i + 1)1) matrix whose columns are orthonormal, and E is 
a real diagonal {{i + 1)1) x {{i + 1)1) matrix whose entries are all nonnegative. 
(See, for example, Chapter 8 in [20] for details concerning the construction 
of such an SVD.) 

5. Compute the nx + 1)1) product matrix 

V = QW. (4.45) 

An analysis similar to the proof of Theorem 4.1 above shows that the matrices U, 

S, and V produced by the algorithm of the present subsection satisfy the same upper 
bounds (4.10) and (4.23) as the matrices produced by the algorithm of Subsection 4.1. 
If desired, one may produce a similarly accurate rank-fc approximation by arranging 
U, S, and V such that the diagonal entries of E appear in nonincrcasing order, and 
then discarding all but the leftmost k columns of U and all but the leftmost k columns 
of V, and retaining only the leftmost uppermost k x k block of S. We will refer to 
the algorithm of the present subsection as "blanczos," due to its similarity with the 
block Lanczos method (see, for example. Subsection 9.2.6 in [20] for a description of 
the block Lanczos method). 

5. Numerical results. In this section, we illustrate the performance of the 

algorithm of the present paper via several numerical examples. 

We use the algorithm to construct a rank-A; approximation, with k = 10, to the 
m x (2m) matrix A defined via its singular value decomposition 

A = f7(^) S(^) (y(^))^, (5.1) 

where U^"^^ is an m x m Hadamard matrix (a unitary matrix whose entries are all 
±l/^/rn), V^^^ is a (2m) x (2m) Hadamard matrix, and S^^^ is an to x (2to) matrix 
whose entries are zero off the main diagonal, and whose diagonal entries are defined 
in terms of the {k + 1)^^ singular value (ik+i via the formulae 

s(j)=a, = (a.+i)L^'/^J/^ (5.2) 

for j = 1, 2, . . . , 9, 10, where [j/2j is the greatest integer less than or equal to j/2, 
and 
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for j = 11, 12, . . . , m — 1, TO. Thus, ui = 1 and ai~ = ak+i (recall that k = 10). We 
always choose ak+i < 1, so that cri > cr2 > ■ ■ ■ > cTm-i ^ <^m- 

Figure 1 plots the singular values ai, a2, ■ ■ ■ , cTm-i, f^m of A with m = 512 and 
cTfc+i = .001; these parameters correspond to the first row of numbers in Table 1, the 
first row of numbers in Tabic 2, and the first row of numbers in Tabic 6. 

Table 1 reports the results of applying the five-step algorithm of Subsection 4.1 
to matrices of various sizes, with i = 1. Table 2 reports the results of applying the 
five-step algorithm of Subsection 4.1 to matrices of various sizes, with i = 0. The 
algorithms of [32], [33], and [27] for low-rank approximation are essentially the same 
as the algorithm used for Table 2 (with i = 0). 

Table 3 reports the results of applying the five-step algorithms of Subsections 4.1 
and 4.3 with varying numbers of iterations i. Rows in the table where i is enclosed 
in parentheses correspond to the algorithm of Subsection 4.3; rows where i is not 
enclosed in parentheses correspond to the algorithm of Subsection 4.1. 

Table 4 reports the results of applying the five-step algorithm of Subsection 4.1 to 
matrices whose best rank-fc approximations have varying accuracies. Table 5 reports 
the; results of applying the blanczos algorithm of Subsection 4.4 to matrices whose 
best rank-fc approximations have varying accuracies. 

Table 6 reports the results of calculating pivoted (5i?-decompositions, via plane 
(Householder) reflections, of matrices of various sizes. We computed the pivoted 
QiJ-decomposition of the transpose of A defined in (5.1), rather than of A itself, 
for reasons of accuracy and efficiency. As pivoted Qi?-decomposition requires dense 
matrix arithmetic, our 1 GB of random-access memory (RAM) imposed the limit 
TO < 4096 for Table 6. 

The headings of the tables have the following meanings: 

• TO is the number of rows in A, the matrix being approximated. 

• n is the number of columns in A, the matrix being approximated. 

• i is the integer parameter used in the algorithms of Subsections 4.1, 4.3, 
and 4.4. Rows in the tables where i is enclosed in parentheses correspond to 
the algorithm of Subsection 4.3; rows where i is not enclosed in parentheses 
correspond to either the algorithm of Subsection 4.1 or that of Subsection 4.4. 

• tis the time in seconds required by the algorithm to create an approximation 
and compute its accuracy 5. 

• (Tfe+i is the {k + ly^ greatest singular value of A, the matrix being approx- 
imated; cTfe+i is also the accuracy of the best possible rank-fc approximation 
to A. 

• 5 ]s the accuracy of the approximation UY^V^ (or (QRP)^, for Table 6) 
constructed by the algorithm. For Tables 1-5, 



where U is an m x k matrix whose columns are orthonormal, y is an n x & 
matrix whose columns are orthonormal, and S is a diagonal k x k matrix 
whose entries are all nonnegative; for Table 6, 



where P is an m x to permutation matrix, R \s a. k x m upper-triangular 
(meaning upper-trapezoidal) matrix, and Q is an n x fc matrix whose columns 
are orthonormal. 



5 = \\A-UYV^\\ 



(5.4) 



5=\\A- {QRPf 



(5.5) 
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The values for t are the average values over 3 independent randomized trials of 
the algorithm. The values for S arc the worst (maximum) values encountered in 3 
independent randomized trials of the algorithm. The values for S in each trial are 
those produced by 20 iterations of the power method applied to ^ — ?7 S V'^ (or 
A — (QRP)'^ , for Table 6), started with a vector whose entries arc i.i.d. centered 
Gaussian random variables. The theorems of [9] and [26] guarantee that this power 
method produces accurate results with overwhelmingly high probability. 

We performed all computations using IEEE standard double- precision variables, 

whose mantissas have approximately one bit of precision less than 16 digits (so that 
the relative precision of the variables is approximately .2E-15). We ran all com- 
putations on one core of a 1.86 GHz Intel Centrino Core Duo microprocessor with 
2 MB of L2 cache and 1 GB of RAM. We compiled the Fortran 77 code using the 
Lahey/Fujitsu Linux Express v6.2 compiler, with the optimization flag — o2 enabled. 
We implemented a fast Walsh-Hadamard transform to apply rapidly the Hadamard 
matrices C/*-'^) and V^"^^ in (5.1). We used plane (Householder) reflections to compute 
all pivoted QiJ-decompositions. We used the LAPACK 3.1.1 divide-and-conquer SVD 
routine dgesdd to compute all full SVDs. For the parameter I, we set I = 12 {= k + 2) 
for all of the examples reported here. 

The experiments reported here and our further tests point to the following: 

1. The accuracies in Table 1 are superior to those in Table 2; the algorithm 
performs much better with i > 0. (The algorithms of [27], [32], and [33] for 
low-rank approximation are essentially the same as the algorithm used for 
Tables 1 and 2 when i = 0.) 

2. The accuracies in Table 1 are superior to the corresponding accuracies in 
Table 6; the algorithm of the present paper produces higher accuracy than 
the classical pivoted <5i?-decompositions for matrices whose spectra decay 
slowly (such as those matrices tested in the present section). 

3. The accuracies in Tables 1-3 appear to be proportional to m^/(^*+^) Cfe+i for 
the algorithm of Subsection 4.1, and to be proportional to m^/''**^ crk+i for 
the algorithm of Subsection 4.3, in accordance with (4.1) and (4.30). The 
numerical results reported here, as well as our further experiments, indicate 
that the theoretical bound (4.10) on the accuracy should remain valid with 
a greatly reduced constant in the right-hand side, independent of the matrix 
A being approximated. See item 6 below for a discussion of Tables 4 and 5. 

4. The timings in Tables 1-5 are consistent with (4.29), as we could (and did) 
apply the Hadamard matrices U^"^^ and V^"^^ in (5.1) to vectors via fast 
Walsh-Hadamard transforms at a cost of 0{m log(rn)) floating-point opera- 
tions per matrix-vector multiplication. 

5. The quality of the pseudorandom number generator has almost no effect on 
the accuracy of the algorithm, nor does substituting uniform variates for the 
normal variates. 

6. The accuracies in Table 5 are superior to those in Table 4, particularly when 
the k^^ greatest singiilar value (Tk of the matrix A being approximated is 
very small. Understandably, the algorithm of Subsection 4.1 would seem 
to break down when (cfe)^*^^ is less than the machine precision, while ak 
itself is not, imlike the blanczos algorithm of Subsection 4.4. When (cr/c)^*^^ 
is much less than the machine precision, while ak is not, the accuracy of 
blanczos in the presence of roundoff is similar to that of the algorithm of 
Subsection 4.1 run with a reduced i. When (fTfe)^*+^ is much greater than the 
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machine precision, the accuracy of blanczos is similar to that of the algorithm 
of Subsection 4.1 run with i being the same as in the blanczos algorithm. Since 
the blanczos algorithm of Subsection 4.4 is so tolerant of roundoff, we suspect 
that the blanczos algorithm is a better general-purpose black-box tool for the 
computation of principal component analyses, despite its somewhat higher 
cost as compared with the algorithms of Subsections 4.1 and 4.3. 
Remark 5.1. A MATLAB® implementation of the blanczos algorithm of 

Subsection 4.4 is available on the file exchange at http : //www .mathworks . com in the 

package entitled, "Principal Component Analysis." 

6. Appendix. In this appendix, we restate and prove Lemmas 3.1 and 3.2 from 
Section 3. 

The following lemma, stated earlier as Lemma 3.1 in Section 3, shows that the 
product AQQ^ of matrices A, Q, and is a good approximation to a matrix A, 
provided that there exist matrices G and S such that 

1. the columns of Q arc orthonormal, 

2. Q S is a, good approximation to (G {AA^y A)'^ , and 

3. there exists a matrix F such that \\F\\ is not too large, and FG{AA'^y A is 

a good approximation to A. 
Lemma 6.1. Suppose that i, k, I, m, and n are positive integers with k < I < 
m <n. Suppose further that A is a real mxn matrix, Q is a real nx k matrix whose 
colum,ns are orthonormal, S is a real k x I matrix, F is a real m x I matrix, and G is 
a real I x m matrix. 



\\AQQ^-A\\ <2\\FG{AA^y A- A\\+2\\F\\\\Q S - {G{AA^y Afl (6.1) 



\\AQQ^ -A\\ < \\AQQ^ -FGBQQ^W + WFGBQQ^ -FGB\\ 

+ \\FGB-A\\. (6.3) 

First, we provide a bomid for \\AQ — FG BQ Q^\\. Clearly, 



Then, 



Proof. The proof is straightforward, but tedious, as follows. 
To simplify notation, we define 



B = {AA'^yA. 



(6.2) 



We obtain from the triangle inequality that 



WAQQ'^-FGBQQ'^W < \\A-FGB\\ \\Q\\ \\Q'^\\. 



(6.4) 



It follows from the fact that the columns of Q are orthonormal that 



WQW < 1 



(6.5) 



and 



IIQ^II < 1- 



(6.6) 



Combining (6.4), (6.5), and (6.6) yields 



\\AQQ'^ -FGBQQ'^W < \\A-FGB\\. 



(6.7) 
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Next, we provide a bound for \\F G B Q Q'^ - F G B\\. Clearly, 

WFGBQQ'^ -FGB\\ < \\F\\ \{GBQQ^ -GB\\. (6.8) 
It follows from the triangle inequality that 

WGBQQ'^ -GB\\ < WGBQQ'^-S'^Q'^QQ'^W 

+ 115^ Q - g^ii + 115^ -gb\\. (6.9) 

Furthermore, 

WGBQQ'^ -S^Q'^QQ'^W < - ||Q|| ||Q'^||. (6.10) 
Combining (6.10), (6.5), and (6.6) yields 

Also, it follows from the fact that the columns of Q are orthonormal that 

Q^Q = 1. (6.12) 

It follows from (6.12) that 

WS'^Q'^QQ'^ -S'^Q'^\\=0. (6.13) 
Combining (6.9), (6.11), and (6.13) yields 

\\GBQQ^ -GB\\ <2\\S^Q^ -GB\\. (6.14) 
Combining (6.8) and (6.14) yields 

WFGBQQ'^ -FGB\\ <2\\F\\ \\S'^ Q'^ - G B\\. (6.15) 
Combining (6.3), (6.7), (6.15), and (6.2) yields (6.1). □ 

The following lemma, stated earlier as Lemma 3.2 in Section 3, shows that, for 
any positive integer i, matrix A, and matrix G whose entries arc i.i.d. Gaussian 
random variables of zero mean and unit variance, with very high probability there 
exists a matrix F with a reasonably small norm, such that FG (AA^y A is a good 
approximation to A. This lemma is similar to Lemma 19 of [29]. 

Lemma 6.2. Suppose that i, j , k, I, m, and n are positive integers with j < k < 
I < m < n. Suppose further that A is a real m x n matrix, G is a real I x m matrix 
whose entries are i.i.d. Gaussian random variables of zero mean and unit variance, 
and P and 7 are positive real numbers, such that the j*'* greatest singular value <jj of 
A is positive, 7 > 1, 



$ = 1 - 



I / „ \ '-J+i 



V27r(/-i + l) + 



1 / 2 ^ max(m — fe, I) 



4 (72 - 1) ^J'K max(m - k, I) 7^ V '^^^ \ 

1 / 272 



4(72-1) VttT^ Ve^ 



(6.16) 
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is nonnegative. 

Then, there exists a real m x I matrix F such that 

\\FG{A A^y A-A\\< V2|2 (3^^^ + l a^+i 



/ \ 4i 

/ 0"fe+l 



+ y2/ max(m-A;,0/32 7' (^-^j + 1 f^fe+i (6.17) 
and 

\\F\\ < (6.18) 

with probability not less than $ defined in (6.16), where aj is the j*'' greatest singular 
value of A, aj+i is the {j + 1)** greatest singular value of A, and <Tk+i is the {k + 1)** 
greatest singular value of A. 

Proof. We prove the existence of a matrix F satisfying (6.17) and (6.18) by 
constructing one. 

We start by forming an SVD of A, 

A = U'£V'^, (6.19) 

where ?7 is a real unitary m x m matrix, S is a real diagonal m x m matrix, and V 
is a real nx m matrix whose columns are orthonormal, such that 

Ep,p = cTp (6.20) 

for p = 1, 2, . . . , m — 1, m, where Sp^p is the entry in row p and cohimn p of S, and 
CTp is the p*^ greatest singular value of A. 

Next, we define auxiliary matrices H. R, F, S, T, 0, and P. We define H to be 
the leftmost / x j block of the I x m matrix GU, R to be the I x {k — j) block of 
G U whose first column is the (fc + I)'** column of G U, and F to be the rightmost 
I X [m — k) block oi GU, so that 

GU = { H \ R\T ) . (6.21) 

Combining the fact that U is real and unitary, and the fact that the entries of G 
are i.i.d. Gaussian random variables of zero mean and unit variance, we sec that the 
entries of H are also i.i.d. Gaussian random variables of zero mean and unit variance, 
as are the entries of R, and as are the entries of F. We define H^~^^ to be the real 
j X I matrix given by the formula 

H^-^^ = {H^ H)-^ (6.22) 

{H'^ H is invertible with high probability due to Lemma 2.7). We define 5 to be the 

leftmost uppermost j x j block of E, T to be the (k — j) x (fc — j) block of E whose 
leftmost uppermost entry is the entry in the (j + I)''' row and (j + I)''' column of E, 



(6.23) 



(m - 


-k) 


X (m 


' S 










T 





^ 





e J 
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We define P to be the real m x I matrix whose uppermost j x I block is the product 

S'~^* H'^~^\ whose entries arc zero in the (k—j) x I block whose first row is the {j + iy^ 
row of P, and whose entries in the lowermost {m — k) x I block are zero, so that 



P = 







V" 







(6.24) 



Finally, we define F to be the m x I matrix given by 
F=UP=U 







V 



(6.25) 



Combining (6.22), (2.1), the fact that the entries of H are i.i.d. Gaussian random 
variables of zero mean and unit variance, and Lemma 2.7 yields 



^f(-i) 



with probability not less than 



1 - 



V27r(/-j + l) V(^-i + l)/3 



(6.26) 



(6.27) 



Combining (6.25), (6.26), (6.23), (6.20), the fact that S is zero off its main diagonal, 

and the fact that U is unitary yields (6.18). 

We now show that F defined in (6.25) satisfies (6.17). 
Combining (6.19), (6.21), and (6.25) yields 



FG{AA^y A- A = V 







7 



( I i? I r ) s^' - 1 1 sy"^. (6.28) 



Combining (6.22) and (6.23) yields 



7 





1 r ) s^' - 1 j E 






5-2»£^(-l)^7^2i+l 


^-2i^(-l)pQ2,+ l \ 





-T 





V 





-e / 



(6.29) 



Furthermore, 






S-'' 




S-'' 







-T 











-e / 



< 



^-2i j^(-l)^2,2i+l ^-2i jj(-l) p q2»+1 



IITI 



liei 



(6.30) 
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Moreover, 







-1) ^y2i+l 


< 




^(-1) 






i)re2i+i 


< 


\s-'f 





and 



Combining (6.23) and (6.20) yields 



\\R\\ \\Tt+' (6.31) 

|r||||ef^+\ (6.32) 

IS-HI < — , (6.33) 

||T||<a,+i, (6.34) 

||e||<afe+i. (6.35) 

Combining (6.28)-(6.35) and the fact that the columns of U are orthonormal, as are 
the columns of V, yields 



and 



\FG{AA^y A- Af < 



+ 



\\m' 



^(-1) 



2 / f^J + l 



1 (<^. + l)' 



2 / '^k+l 



+ 1 (afe+i)2. (6.36) 



Combining Lemma 2.6 and the fact that the entries of R arc i.i.d. Gaussian 
random variables of zero mean and unit variance, as are the entries of F, yields 

^ (6.37) 

(6.38) 



\B\\ < V2l 7 



and 

with probability not less than 
1 



jr|| < V2 max(m - k, I) 



27^ 

4 (72 - 1) ^TT max(m - k, I) 72 V e^'"^ 



2 \ m.ax(m — fc, /) 



27' 



4 (72 - 1) v';^!^ VeT' 

Combining (6.36), (6.26), (6.37), and (6.38) yields 

4i \ 



T • (6.39) 



\FG{AA'yA-A\\^ < 2/2/32^ 



2 o2 „,2 / "'i+l 



+ 1 ('^.■+1)' 



+ l^l max(m - k, I) 7^ j + 1 j {^k+if (6.40) 

with probability not less than $ defined in (6.16). Combining (6.40), the fact that 
CTj+i < cTj, and the fact that 

VxTy<V^ + y^ (6.41) 
for any nonnegative real numbers x and y yields (6.17). □ 
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Table 1 


: Five-step algorithm of Subsection 4.1 
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Table 2: Five-step algorithm of Subsection 4.1 
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Table 3: Five-step algorithms of Subsections 4.1 and 4.3 
(parentheses around i designate Subsection 4.3) 
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Table 4: Five-step algorithm of Subsection 4.1 
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Table 5: Five-step algorithm of Subsection 4.4 
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Table 6: Pivoted Qii-decomposition 
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Figure 1: Singular values with m = 512, n = 1024, 
and CTfe+i = .001 
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